CVM_2SAMP

Overview

The CVM_2SAMP function performs the two-sample Cramér-von Mises test, a nonparametric hypothesis test of whether two independent samples are drawn from the same continuous distribution. Unlike parametric tests that assume a specific distributional form, this test compares the empirical distribution functions of the two samples directly, so it can be used as a distribution-free test of equality of distributions.

The Cramér-von Mises criterion, first proposed by Harald Cramér and Richard von Mises in 1928–1930, measures the integrated squared difference between two cumulative distribution functions. The distribution of the two-sample version of the criterion was studied by T. W. Anderson in 1962. For theoretical background, see the Cramér-von Mises criterion article on Wikipedia.

For two samples X_1, \ldots, X_n and Y_1, \ldots, Y_m, the test statistic is computed based on the ranks of the combined sample:

T = \frac{U}{nm(n+m)} - \frac{4mn - 1}{6(m+n)}

where U = n\sum_{i=1}^{n}(r_i - i)^2 + m\sum_{j=1}^{m}(s_j - j)^2, r_1 \le \cdots \le r_n are the ranks of the sorted X observations in the pooled sample of size n + m, and s_1 \le \cdots \le s_m are the ranks of the sorted Y observations.
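
The rank formula can be checked by hand. The following is a minimal pure-Python sketch, not the SciPy implementation, assuming that each r_i and s_j is the midrank of the i-th or j-th smallest value of its sample in the pooled data. The helper names midranks and cvm_statistic are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks of `values`, giving tied values the average (mid) rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; use their mean.
        mid = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def cvm_statistic(x, y):
    """Two-sample Cramér-von Mises statistic T computed from pooled-sample ranks."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    ranks = midranks(x + y)
    r, s = ranks[:n], ranks[n:]  # ranks of the sorted X values, then the sorted Y values
    u = n * sum((r_i - i) ** 2 for i, r_i in enumerate(r, start=1))
    u += m * sum((s_j - j) ** 2 for j, s_j in enumerate(s, start=1))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


print(cvm_statistic([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5]))  # 0.0625 (Example 1)
print(cvm_statistic([1, 2, 3, 4], [10, 20, 30, 40]))      # 0.6875 (Example 4)
```

Sorting each sample before ranking is what makes r_i - i measure how far the i-th smallest X has been pushed up the pooled ordering by Y values.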

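This implementation wraps the cramervonmises_2samp function from SciPy's scipy.stats module. The cvm_twosamp_method argument (passed to SciPy as method) controls how the p-value is computed: "exact" evaluates the exact null distribution of the statistic over all possible assignments of ranks to the two samples, which is practical only for small samples; "asymptotic" uses the limiting distribution; and "auto" (the default) uses the exact method when both samples contain at most 20 observations and the asymptotic method otherwise. When the data contain ties, midranks are used for ranking.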
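
The idea behind the exact p-value can be sketched as a brute-force permutation test: under the null hypothesis every choice of which n of the n + m pooled ranks belong to X is equally likely, so the p-value is the fraction of choices whose statistic is at least the observed one. This is a minimal sketch assuming no ties; it is not SciPy's algorithm, which is far more efficient (brute force over two samples of 20 would mean about 1.4 × 10^11 combinations). The names t_from_ranks, resolve_method, and exact_pvalue are hypothetical, and resolve_method only mirrors the "auto" rule described above.

```python
from itertools import combinations


def t_from_ranks(r, s):
    """CvM statistic T from the sorted pooled ranks r (first sample) and s (second)."""
    n, m = len(r), len(s)
    u = n * sum((r_i - i) ** 2 for i, r_i in enumerate(r, start=1))
    u += m * sum((s_j - j) ** 2 for j, s_j in enumerate(s, start=1))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


def resolve_method(n, m, method="auto"):
    """Mirror the documented 'auto' rule: exact when both samples have <= 20 values."""
    if method == "auto":
        return "exact" if max(n, m) <= 20 else "asymptotic"
    return method


def exact_pvalue(x, y):
    """Brute-force P(T >= t_obs) over all C(n+m, n) rank assignments (assumes no ties)."""
    n, m = len(x), len(y)
    rank_of = {v: k for k, v in enumerate(sorted(x + y), start=1)}
    if len(rank_of) != n + m:
        raise ValueError("this sketch assumes the pooled data contain no ties")
    t_obs = t_from_ranks(sorted(rank_of[v] for v in x), sorted(rank_of[v] for v in y))
    all_ranks = range(1, n + m + 1)
    hits = total = 0
    for combo in combinations(all_ranks, n):  # each combo is already in ascending order
        chosen = set(combo)
        rest = [k for k in all_ranks if k not in chosen]
        total += 1
        if t_from_ranks(combo, rest) >= t_obs - 1e-12:  # tolerance for float round-off
            hits += 1
    return t_obs, hits / total


t, p = exact_pvalue([1, 2, 3, 4], [10, 20, 30, 40])
print(t, round(p, 4))  # 0.6875 0.0286
```

For n = m = 4 there are only 70 combinations, which is why the loop is instant here.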

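The Cramér-von Mises test is an alternative to the Kolmogorov-Smirnov test. Where the Kolmogorov-Smirnov statistic uses only the largest vertical gap between the two empirical distribution functions, the Cramér-von Mises statistic accumulates the squared gaps over the whole pooled sample, which can make it more sensitive to differences that are spread across the distribution rather than concentrated at a single point.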
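
To make the contrast concrete, here is a hedged sketch assuming no ties. It uses the equivalent empirical-CDF form T = \frac{nm}{(n+m)^2}\sum_{k=1}^{n+m}\big(F_n(z_k) - G_m(z_k)\big)^2, where F_n and G_m are the empirical CDFs of the two samples and z_k runs over the pooled observations. The function name ks_and_cvm is hypothetical. The two sample pairs below have the same Kolmogorov-Smirnov distance D but different T.

```python
def ks_and_cvm(x, y):
    """Return (D, T): the KS distance max|F_n - G_m| and the two-sample CvM statistic T,
    both evaluated at every point of the pooled sample (assumes no ties)."""
    n, m = len(x), len(y)
    gaps = []
    for z in sorted(x + y):
        f = sum(1 for v in x if v <= z) / n  # empirical CDF of the first sample at z
        g = sum(1 for v in y if v <= z) / m  # empirical CDF of the second sample at z
        gaps.append(f - g)
    d = max(abs(gap) for gap in gaps)                     # KS looks at the worst gap only
    t = n * m / (n + m) ** 2 * sum(gap ** 2 for gap in gaps)  # CvM sums every squared gap
    return d, t


print(ks_and_cvm([1, 2, 6, 7], [3, 4, 5, 8]))  # (0.5, 0.125)
print(ks_and_cvm([1, 2, 4, 6], [3, 5, 7, 8]))  # (0.5, 0.25)
```

In the second pair the first sample stays ahead of the second over most of the range, so the squared gaps add up to a larger T even though the largest single gap is the same.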

This example function is provided as-is without any representation of accuracy.

Excel Usage

=CVM_2SAMP(x, y, cvm_twosamp_method)
  • x (list[list], required): First sample data as a 2D list. All values are flattened before processing.
  • y (list[list], required): Second sample data as a 2D list. All values are flattened before processing.
  • cvm_twosamp_method (str, optional, default: "auto"): Method used to compute the p-value. Must be one of "auto", "asymptotic", or "exact" (lowercase).

Returns (list[list] or str): A 2D list with a single row [statistic, pvalue], or an error message string if the input is invalid.
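
Because both samples are flattened, the shape of the Excel range does not affect the result. The sketch below reproduces the row-major flattening used in the Python code; it assumes, without confirmation from this page, that an Excel column literal such as {1;2;3;4} arrives as one value per row, and the helper name flatten_2d is hypothetical.

```python
def flatten_2d(rows):
    """Row-major flattening, matching the list comprehension used in cvm_2samp."""
    return [float(item) for row in rows for item in row]


# Assumed encoding of the column literal {1;2;3;4}: one single-value row per cell.
column = [[1], [2], [3], [4]]
# Assumed encoding of a 2x2 range {1,2;3,4} holding the same numbers.
block = [[1, 2], [3, 4]]

print(flatten_2d(column))  # [1.0, 2.0, 3.0, 4.0]
print(flatten_2d(block))   # [1.0, 2.0, 3.0, 4.0]
```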

Examples

Example 1: Demo case 1

Inputs:

x y
1 1.5
2 2.5
3 3.5
4 4.5

Excel formula:

=CVM_2SAMP({1;2;3;4}, {1.5;2.5;3.5;4.5})

Expected output:

Result
0.0625 1
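
As a check on Example 1 (n = m = 4), the pooled order alternates x, y, x, y, ..., so the X ranks are 1, 3, 5, 7 and the Y ranks are 2, 4, 6, 8:

```latex
\begin{aligned}
U &= 4\left[(0^2 + 1^2 + 2^2 + 3^2) + (1^2 + 2^2 + 3^2 + 4^2)\right] = 4(14 + 30) = 176,\\
T &= \frac{176}{4 \cdot 4 \cdot 8} - \frac{4 \cdot 4 \cdot 4 - 1}{6 \cdot 8} = 1.375 - 1.3125 = 0.0625.
\end{aligned}
```

A perfectly alternating arrangement gives the smallest value T can take for n = m = 4, so every possible rank assignment has T ≥ 0.0625 and the exact p-value (used by "auto" at this sample size) is 1.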

Example 2: Demo case 2

Inputs:

x y cvm_twosamp_method
1 1.5 asymptotic
2 2.5
3 3.5
4 4.5

Excel formula:

=CVM_2SAMP({1;2;3;4}, {1.5;2.5;3.5;4.5}, "asymptotic")

Expected output:

Result
0.0625 0.9742

Example 3: Demo case 3

Inputs:

x y cvm_twosamp_method
1 1.5 exact
2 2.5
3 3.5
4 4.5

Excel formula:

=CVM_2SAMP({1;2;3;4}, {1.5;2.5;3.5;4.5}, "exact")

Expected output:

Result
0.0625 1

Example 4: Demo case 4

Inputs:

x y
1 10
2 20
3 30
4 40

Excel formula:

=CVM_2SAMP({1;2;3;4}, {10;20;30;40})

Expected output:

Result
0.6875 0.0286
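
In Example 4 every x lies below every y, so the X ranks are 1, 2, 3, 4 and the Y ranks are 5, 6, 7, 8:

```latex
\begin{aligned}
U &= 4\sum_{i=1}^{4}(i - i)^2 + 4\sum_{j=1}^{4}\big((j + 4) - j\big)^2 = 0 + 4 \cdot 4 \cdot 16 = 256,\\
T &= \frac{256}{4 \cdot 4 \cdot 8} - \frac{4 \cdot 4 \cdot 4 - 1}{6 \cdot 8} = 2 - 1.3125 = 0.6875,\\
p &= \frac{2}{\binom{8}{4}} = \frac{2}{70} \approx 0.0286.
\end{aligned}
```

Under the null hypothesis all 70 assignments of four of the eight ranks to x are equally likely, and only the two completely separated ones (x entirely below y, or entirely above) reach T = 0.6875, which gives the exact p-value shown above.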

Python Code

import math
from scipy.stats import cramervonmises_2samp as scipy_cramervonmises_2samp

def cvm_2samp(x, y, cvm_twosamp_method='auto'):
    """
    Performs the two-sample Cramér-von Mises test using scipy.stats.cramervonmises_2samp.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises_2samp.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): First sample data as a 2D list. All values are flattened before processing.
        y (list[list]): Second sample data as a 2D list. All values are flattened before processing.
        cvm_twosamp_method (str, optional): Method used to compute the p-value. Valid options: 'auto', 'asymptotic', 'exact'. Default is 'auto'.

    Returns:
        list[list] | str: A 2D list with a single row [statistic, pvalue], or an error message string if the input is invalid.
    """
    # Validate x and y are 2D lists
    if not isinstance(x, list):
        return "Invalid input: x must be a list."
    if not isinstance(y, list):
        return "Invalid input: y must be a list."
    if not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    if not all(isinstance(row, list) for row in y):
        return "Invalid input: y must be a 2D list."

    # Flatten x and y
    try:
        x_flat = [float(item) for row in x for item in row]
        y_flat = [float(item) for row in y for item in row]
    except (TypeError, ValueError) as e:
        return f"Invalid input: x and y must contain only numeric values. {e}"
    if len(x_flat) < 2 or len(y_flat) < 2:
        return "Invalid input: each sample must contain at least two values."

    # Validate method
    if cvm_twosamp_method not in ['auto', 'asymptotic', 'exact']:
        return "Invalid input: cvm_twosamp_method must be 'auto', 'asymptotic', or 'exact'."

    # Call scipy.stats.cramervonmises_2samp
    try:
        result = scipy_cramervonmises_2samp(x_flat, y_flat, method=cvm_twosamp_method)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except ValueError as e:
        return f"Invalid input for Cramér-von Mises test: {e}"
    except Exception as e:
        return f"Error computing Cramér-von Mises test: {e}"

    # Check for nan/inf
    if math.isnan(stat) or math.isinf(stat) or math.isnan(pvalue) or math.isinf(pvalue):
        return "Invalid result: statistic or pvalue is nan or inf."
    return [[stat, pvalue]]
